Executive Summary


In preparation for online administration of the ACT® test, ACT conducted studies to examine 
the comparability of scores between online and paper administrations, including a timing study 
in fall 2013, a mode comparability study in spring 2014, and a second mode comparability 
study in spring 2015. This report presents major findings from these studies, focusing on the 
mode comparability studies. 


Fall 2013 Timing Study 

Standard paper administration of the ACT allows 45, 60, 35, and 35 minutes for the English, 
mathematics, reading, and science tests, respectively. The purpose of the timing study was to 
evaluate whether online administration of the ACT would require different time limits than the 
paper administration. 


The four tests were administered online to approximately 3,000 examinees, with each 
examinee responding to one test. Students were randomly assigned to take the test under one 
of three timing conditions: the current paper time limit, the current time limit plus five minutes, 
and the current time limit plus ten minutes. At the end of the test, the students were also given 
a survey with questions regarding their testing experience, including whether or not they felt 
they had enough time to finish the test. 


Students’ item and test level scores, item omission rates, item and test latency information, 
and student survey results were analyzed using a variety of methods, both descriptive and 
inferential. Results suggested that online scores on the reading and science tests would more 
likely be comparable to paper administration scores with an increase in testing time, given the 
delivery system and conditions at the time. Because of the potential confounding of motivation 
and familiarity with the online testing format in the timing study, a decision was made to 
tentatively increase online testing time for the reading and science tests by five minutes and 
continue to evaluate the timing issue in the subsequent mode comparability studies. 


Spring 2014 Mode Comparability Study 

To gather additional information about the differences between testing modes (i.e., online vs. 
paper) and to learn about administration issues, ACT conducted a mode comparability study 
in an operational testing environment where participating students received college-reportable 
scores. Therefore, it was imperative that scores reported across modes be comparable. To 
ensure this was the case, a randomly equivalent groups design was implemented, allowing 
equating methodology to be used to adjust for score differences across modes. The purposes 
of the mode comparability study were to: (1) investigate the comparability of the scores from 
the two testing modes; (2) obtain interchangeable scores across modes for operational score 
reporting; (3) re-evaluate the timing decisions for the online administration of the reading and 
science tests; and (4) gain insights into the online administration process. 


Students participating in the spring 2014 study could choose to register for the ACT with or 
without the writing test. Within the group of students taking the ACT with writing and within the 
group taking it without writing, students were randomly assigned to take one of the three forms 
(one paper and two online) that were administered in the study. The assignment was similar 
to distributing spiraled paper booklets. After the administration, survey questions were sent to 
students who participated in the study to ask for their comments and feedback on their testing 
experience. 


More than 7,000 students from about 80 schools across the country signed up for this study. 
Data were cleaned based on reviews of the proctor comments, phone logs, irregularity reports, 
latency information, and an examination of the random assignment. Students with invalid 
scores and test centers with large discrepancies in form counts across modes were excluded 
from further analyses. 


Analyses were conducted to investigate mode comparability at two levels: score equivalency 
and construct equivalency. These two levels were differentiated by some researchers (e.g., 
Lottridge, Nicewander, Schulz, & Mitzel, 2008), but were used here mainly for the convenience 
of organizing analyses. Score equivalency was examined in terms of the similarity of test 
score distributions between the two modes, such as means, standard deviations, and relative 
cumulative frequency distributions. For the English, mathematics, reading, and science tests,
item score distributions, such as the item p-values, item response distributions, and item
omission rates, were also compared across modes. In addition, measurement precision (i.e., reliability
and conditional standard error of measurement) was compared across modes, and the item 
latency information for the online test items was also examined. The ACT writing scores were 
examined conditional on examinees’ English scores. Construct equivalency was examined
by comparing the dimensionality and factor loadings, and by examining differential item
functioning (DIF) between online and paper scores.


Results showed that although little difference was found between the two modes in terms of 
test reliability, correlations among tests, effective weights, and factor structures, item scores 
and test scores tended to be higher and omission rates tended to be lower for the online group 
than for the paper group, especially for the reading and science tests. Equating methodology 
was used for all four multiple-choice tests to adjust for the differences to ensure that the 
college reportable scores of students participating in the mode comparability study were 
comparable to national test takers, regardless of the testing mode. Based on the findings from 
this spring 2014 mode comparability study, a decision was made to eliminate the extra five 
minutes for the online reading and science tests for the spring 2015 mode comparability study. 
Refinements in the delivery of the online assessments may be one of the factors contributing to 
the different recommendations of online test time limits for the two mode comparability studies. 


Spring 2015 Mode Comparability Study 

The mode comparability study in spring 2015 used the same data collection design as the 
spring 2014 study. The main purposes of this second mode comparability study were to further 
examine the comparability between online and paper scores and the impact of eliminating the 
extra five minutes for the reading and science online tests. More than 4,000 students from 
more than 40 schools signed up to participate in this study. One paper form and two online 
forms were administered. Students who participated in the 2015 study all took the re-designed 
ACT writing test, which was to be launched in fall 2015. Since the spring 2015 study followed 
the same design as the 2014 study, similar analyses were conducted for the four
multiple-choice tests.


Results showed that students performed similarly across modes on the science test but still 
higher on the online reading test even without the extra five minutes. Equating methodology 
was applied to produce comparable scores regardless of the testing mode. For the two 
prompts included in the writing mode study, students performed similarly across modes on one 
prompt but differentially on the other. Score distributions of randomly equivalent groups from a 
subsequent online administration, also conducted in spring 2015, were examined as a further
validation of online form conversions obtained from the mode comparability study. 


Evidence for Paper and Online ACT® 
Comparability: Spring 2014 and 2015 
Mode Comparability Studies 


Introduction 


As part of the initial development process of delivering the ACT online, ACT conducted several 
special studies. A timing study was conducted in fall 2013 to help inform the time limits for 
online administration, followed by a mode comparability study conducted in spring 2014, and a
second mode comparability study in spring 2015. This report presents the designs, statistical 
analyses, and major findings of these studies, focusing on the mode comparability studies. 


Transferring test items from paper booklets to computer for online delivery is more complicated 
than it might appear (Leeson, 2006; Mutler, 1996; Parshall, Spray, Kalohn, & Davey, 2002; 
Pommerich, 2004; Schroeders & Wilhelm, 2011). If score equivalence is sought between 
online and paper versions of a test, careful decisions need to be made not only to optimize
the presentation of items, but also to minimize mode effects so that potential interference with
students’ performance can be eliminated to the extent possible, and the differential test taker
performance between online and paper versions of the test due to differences in test mode can
be negated.


To best achieve both maximum comparability to the paper version and optimal online interface 
and delivery, an iterative process was adopted by ACT when developing the online delivery 
system for the ACT. That is, to aid the online version development process, studies were
conducted to evaluate the comparability of scores from online and paper delivery of the
ACT under various conditions of the online delivery system/design and to inform decisions
about revisions of the online version to be evaluated in further studies. In addition, due
to the intended high-stakes uses of the ACT test scores, if a study involved operational
score reporting, scores were adjusted for students participating in the study using equating
methodology.


Fall 2013 Timing Study 


Standard paper administration of the ACT allows 45, 60, 35, and 35 minutes for the English, 
mathematics, reading, and science tests, respectively. To inform timing decisions about the 
online administration, a study was undertaken in fall 2013 to evaluate the online experience, 
such as whether scrolling passages would require more time. 


Data and Design 


Online versions of the four multiple-choice tests were administered to approximately
3,000 examinees from about 58 schools, with each examinee taking one of the tests. Each test
was administered under three conditions: the current paper time limit, the current time limit plus 
five minutes, and the current time limit plus ten minutes. The tests with the different time limits 
were randomly assigned to students. At the end of the test, students were also given a survey 
with questions regarding their testing experience, including whether or not they felt they had 
enough time to finish the test. Depending on their testing time limit, they would receive different 
amounts of time for the survey with a different number of questions so that all students were 
engaged in the task for the same amount of time. The three testing time limits and the four
tests produced 12 different combinations of study conditions with about 250 examinees in each 
condition. 


Statistical Analyses and Results 

The representativeness of the schools participating in the timing study was evaluated by 
comparing these schools’ earlier ACT test scores with other samples. Table 1 presents the 
means and standard deviations (SDs) of the ACT scores for three different samples: all 
operational data from the 2012-13 ACT testing year, 2012-13 ACT operational data from 
only those schools participating in the fall 2013 timing study, and all data from the 2013 ACT 
equating study. The fact that these schools’ average performance on the ACT in the previous 
year was just slightly higher than the national average and similar to the equating sample 
average provided some support for the representativeness of the timing study samples in 
terms of overall academic achievement. 


Table 1. Timing Study Participating Schools Compared with National and Equating
Samples on the ACT


                   All 2012-13                  Only Students from Timing Study       All 2013
                   Operational Data             Schools in 2012-13 Operational Data   Equating Data
Test               N          Mean   SD         N        Mean   SD                    N        Mean   SD
English            3,342,127  20.66  6.37       9,656    21.51  6.05                  31,553   21.26  5.57
Mathematics        3,342,422  21.09  5.25       9,655    21.70  4.87                  31,553   21.79  4.71
Reading            3,340,291  21.36  6.12       9,653    21.89  6.00                  31,553   22.28  5.57
Science            3,338,369  20.99  5.18       9,652    21.41  4.93                  31,553   21.93  4.42


Item and test level scores, item omission rates, item and test latency information, and student 
survey results were analyzed using a variety of methods, both descriptive and inferential. The 
results from a few of the timing study analyses are presented below. 


Table 2 contains the percentages of students omitting zero, one, two, or three or more items
under the three timing conditions (i.e., current, plus 5 minutes, or plus 10 minutes). More students
omitted three or more items for the reading and science tests under the current timing 
condition; however, with five or ten more minutes added, the percentage of students omitting 
three or more items was substantially reduced. 


Table 2. Percentage of Students Omitting Zero to Three or More Items for Fall 2013


Test                     # of Omissions   Current   Plus 5 Minutes   Plus 10 Minutes
English (75 items)       0                67.53     73.88            71.75
                         1                13.28     13.81            16.73
                         2                 2.95      3.36             4.09
                         3+               16.28      8.95             7.41
Mathematics (60 items)   0                63.43     69.61            70.18
                         1                13.43     14.71            13.30
                         2                 2.78      3.43             3.21
                         3+               20.40     12.25            13.33
Reading (40 items)       0                54.19     63.64            71.72
                         1                 7.88      7.39             8.08
                         2                 1.48      1.70             2.02
                         3+               36.46     27.27            18.24
Science (40 items)       0                61.03     75.00            82.89
                         1                 9.93     10.45             9.89
                         2                 1.84      1.49             2.28
                         3+               27.25     13.04             4.94
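

The tabulation behind Table 2 can be sketched as follows. This is a minimal illustration rather than the code used in the study: the response encoding (None for an omitted item) and the function name omission_distribution are assumptions made for the example.

```python
from collections import Counter


def omission_distribution(responses):
    """Percentage of students omitting 0, 1, 2, or 3+ items.

    `responses` holds one list of answers per student; None marks an
    omitted item (an assumed encoding, not taken from the report).
    """
    counts = Counter()
    for student in responses:
        omitted = sum(1 for answer in student if answer is None)
        counts["3+" if omitted >= 3 else str(omitted)] += 1
    total = len(responses)
    return {key: 100.0 * counts[key] / total for key in ("0", "1", "2", "3+")}


# Toy data: four students on a five-item test (hypothetical values).
toy = [
    ["A", "B", "C", "D", "A"],      # no omissions
    ["A", None, "C", "D", "A"],     # one omission
    ["A", "B", None, None, None],   # three omissions
    [None, None, None, None, "B"],  # four omissions, counted as "3+"
]
dist = omission_distribution(toy)
print(dist)
```

Running the function once per test and timing condition would give one column of a table laid out like Table 2.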


Figure 1 presents the item p-values, that is, the percentages of students who answered each
item correctly under each of the timing conditions for each test. The items are ordered along 
the horizontal axis by their position in the test. For the English and mathematics tests, the 
p-values were similar across the three timing conditions, indicating that extending testing time 
did not have much effect on students’ performance on the test items. However, for the reading 
and science tests, higher p-values were observed for tests with additional time, especially for
items near the end of the test.


Figure 1. Item p-values by item position for the three timing conditions for fall 2013


Figure 2 presents the percentage of omissions for each item of the four tests. Again, the items 
are ordered by their position in the test, and the percentage of examinees not responding
to the item is given on the vertical axis. The graphs show that the percentage of examinees
omitting items near the end of the tests was much higher for the reading and science tests 
than those for the English and mathematics tests. This was especially true for the reading test 
where the omission rate reached 40% for the last item under the current paper timing limit. 


Figure 2. Percentage of students omitting items under the three timing conditions for
fall 2013


Figure 3 shows the percentages of responses to the survey question regarding the students’
level of agreement with the statement that they had enough time to finish the test for each 
timing condition of each test. The response rates are shown at the top of each plot (e.g., 
response rate RR = 82.46% for the English test with current time limit). The “Other” category
in the pie charts included the percentage of students who strongly disagreed or disagreed
with the statement. In general, students who took the reading and science tests had larger
percentages of disagreements on the statement that they had enough time to finish the test. 
As testing time increased, those disagreement percentages were reduced. However, they were 
still higher than those for the English and mathematics tests under the same timing condition. 


Figure 3. Student responses to the survey question about if they had enough time
under the three timing conditions for fall 2013 


Online Timing Recommendations and Concerns 


Though results from the fall 2013 timing study suggested that online administration might
require more time for students to complete the reading and science tests, the results were
potentially confounded with issues such as motivation and familiarity with the online testing
format, which made it difficult to base firm timing recommendations on them. For example,
while the reading and science tests showed speededness in these analyses, survey responses
indicated that fewer examinees taking those two tests reported watching the orientation video
about how the online testing worked (see Table 3).


Table 3. Survey Results for the Question Regarding Online Tutorial Video
for Fall 2013


Before taking the test, did you watch the online video about learning to use
the online testing system?


Response (%)   English   Mathematics   Reading   Science
Yes            41        39            14        15
No             59        61            86        85


The final decision was to tentatively increase testing time for the reading and science online 
tests by five minutes and to revisit the time limit issue in the mode comparability studies. 
Because equating methodology had been planned in case of evidence suggesting mode 
differences, comparable scores for the examinees in the mode comparability studies could be 
ensured regardless of possible changes in administration time in the future. 


Mode Comparability Studies 


Two mode comparability studies for the ACT were conducted, one in spring 2014 and one in 
spring 2015. As in the fall 2013 timing study, the content of the test items for the online version 
and the paper version of a test form was intended to be exactly the same. However, there were 
some differences. For example, to assist students in marking the correct row of an answer
document on the paper administration, some item choices run A, B, C, and D and some run
F, G, H, and J. For online, these item choices may all run A, B, C, and D. In addition, some
improvements had been made to the online test delivery system based on experiences and 
feedback from the fall 2013 timing study and the spring 2014 mode comparability study. In the 
spring 2014 study, the testing time for the online and paper administrations was the same for 
the English and mathematics tests, but it was different for the reading and science tests. Five 
additional minutes were added to the online versions of the reading and science tests based 
on the recommendation from the fall 2013 timing study. However, for the spring 2015 study, the 
testing time for both online and paper administration was kept the same based on the findings 
from the spring 2014 mode comparability study. 


The purposes of the mode comparability studies were to: (1) investigate the comparability of 
the ACT scores from the online and paper testing modes; (2) obtain interchangeable scores 
across modes for operational score reporting; (3) re-evaluate the timing decisions for the online 
administration of the ACT; and (4) gain additional insights about the online administration 
process. 


Design

A randomly equivalent groups design was used for the ACT mode comparability studies.
Students were randomly assigned to take one of the three ACT forms that were administered 
in each study—two online forms (delineated as Online_1 and Online_2 in tables and figures) 
and the paper version of one of the two online forms (delineated as Paper_1 in tables and 
figures). One purpose for having an additional online form was to help evaluate the extent of 
the mode effect relative to form differences. 


These studies took place in operational testing environments on one of the ACT national
test dates. Schools with sufficient numbers of computers that met ACT requirements for
online testing were recruited to participate in the studies. Participating schools were also 
required to meet all the other requirements of ACT test centers. Online testing occurred on 
school-provided desktop or laptop computers (Windows/Macintosh). Tablets did not meet the 
requirements for the mode comparability studies. 


Students from participating schools registered for testing via the normal ACT registration 
process and were randomly assigned to take the online or paper version of the ACT. Since 
students did not know which mode they were going to be assigned until the day of testing, a
student tutorial video and a practice test intended to help students navigate the online testing
system were made available to all participating students in advance of testing, in addition to
the standard paper practice test. After testing, survey questions were sent to students who
participated in the studies to ask for their comments and feedback on their testing experiences.


Procedure 

For the multiple-choice tests, though the content of items on the online versions was intended
to be exactly the same as the paper ones, there could still be differences in the appearance
of items between the two modes. These differences could include text font, page size,
page layout, graphics, and others. Before conducting data analyses, ACT first examined
comparability of the online and paper versions of the tests through a qualitative comparison of 
the items in the paper booklets and those in the online version. All substantial differences were 
documented. 


Mode comparability was examined at two levels for the multiple-choice tests: score 
equivalency and construct equivalency. Score equivalency indicates that observed score 
distributions from the two modes are very similar for the two randomly equivalent groups. 
Construct equivalency indicates that the two modes are measuring the same underlying 
abilities or attributes. If score equivalency holds, construct equivalency is partly supported, 
especially given the items were the same for the two modes. However, score equivalency 
cannot guarantee construct equivalency, so additional comparisons to determine whether the 
two modes measure the same construct are still needed. 


Analyses to evaluate mode comparability were carried out in two phases for the
multiple-choice tests. Phase I analyses focused more on score equivalency, examining the similarity
of test score distributions between the two modes, such as means, standard deviations, and
relative cumulative frequency distributions. Item score distributions, such as the item
p-values, item response distributions, and item omission rates, were also compared.


Equating methodology was used to ensure that the college reportable scores for students 
participating in the studies across modes were comparable. Timing decisions were also
re-evaluated based on the evidence gathered from the previous study (i.e., the spring 2014
timing was based on the fall 2013 findings, and the spring 2015 timing was based on the spring
2014 results).


Phase II mode comparability analyses focused more on construct equivalency as well as some
additional analyses, including item and test comparisons based on item response theory (IRT), 
factor analysis, differential item functioning (DIF), generalizability analysis, and evaluation of 
measurement precision after any mode effect was adjusted through equating methodology. 


The following sections present the specifics and results from the two mode comparability 
studies. Results from the Phase I and Phase II analyses are shown first, after which results for
the ACT writing test are discussed. Presented at the end are some results of the online forms 
obtained from an additional online administration, which occurred shortly after the spring 2015 
mode comparability study. 


Spring 2014 Comparability Study 

Students could register to take either the ACT with writing or the ACT without writing for the 
spring 2014 comparability study. Random assignment of students to the online and paper 
forms was done separately for students taking the ACT with writing and those without writing 
so the groups taking the writing test would also be randomly equivalent. 
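

The assignment scheme described above can be sketched in code. The report states only that assignment was random, that it resembled the distribution of spiraled paper booklets, and that it was carried out separately for the with-writing and without-writing groups. The shuffle-then-rotate mechanism, the function name spiral_assign, and the roster format below are assumptions made for illustration.

```python
import random
from collections import Counter

FORMS = ("Paper_1", "Online_1", "Online_2")


def spiral_assign(students, forms=FORMS, seed=None):
    """Assign forms in rotating (spiraled) order within each writing group.

    `students` is a list of (student_id, takes_writing) tuples, a
    hypothetical roster format. Returns a dict of student_id -> form.
    """
    rng = random.Random(seed)
    assignment = {}
    for takes_writing in (True, False):
        ids = [sid for sid, writing in students if writing == takes_writing]
        rng.shuffle(ids)  # random order within the group
        for position, sid in enumerate(ids):
            # Rotate through the forms so counts stay balanced within the group.
            assignment[sid] = forms[position % len(forms)]
    return assignment


# Hypothetical roster: 20 students, half registered for the writing test.
roster = [(f"S{i:02d}", i % 2 == 0) for i in range(20)]
assignment = spiral_assign(roster, seed=2014)
for takes_writing in (True, False):
    counts = Counter(assignment[sid] for sid, w in roster if w == takes_writing)
    print("with writing" if takes_writing else "without writing", dict(counts))
```

Rotating through the forms keeps the form counts within each group equal to within one student, which is the property that makes large discrepancies in form counts at a test center a useful data-cleaning flag.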


Data. More than 7,000 students from about 80 schools across the country signed up for the 
spring 2014 mode comparability study. As expected, not everyone actually showed up on the 
day of testing. Computer issues, power outages, and other problems also prevented some 
students from testing on the scheduled dates. Those students were rescheduled to take the 
test on paper on another date, and their scores were not included in the analyses. Data were 
also cleaned based on reviews of proctor comments, phone logs, irregularity reports, and 
latency information. Centers with large discrepancies in form counts were deleted from the 
analyses. 


All subsequent analyses were based on the final cleaned data. More than 5,500 students with 
at least 1,800 students for each form were included in the spring 2014 cleaned data. Among 
those students, over 2,000 took the writing test, either paper or online. 


Phase I Mode Comparability Analyses and Results for Multiple-Choice Tests. Phase I
comparability analyses for the multiple-choice tests included an examination of the test and 
item level score distributions, test score reliability, and item omission rates across modes. 
Table 4 presents the sample size of each test form in the spring 2014 study as well as the 
means, standard deviations (SDs), minimums, and maximums of the observed total raw scores 
and scale scores for all three forms. Online Form 1 and paper Form 1 contained the same 
items, and were the focus of most analyses. They are simply referred to as the online and 
paper form when online Form 2 is not involved in the comparison. 


Note that the scale scores mentioned in the Phase I analyses refer to scale scores obtained
by applying the paper raw-to-scale score conversions regardless of testing mode. The purpose
of doing so was to examine the mode effect on the scale scores. For example, in Table 4,
the scale score descriptive statistics for Online_1 and Online_2 were obtained by applying
the paper version conversions of Form 1 and Form 2, respectively. However, final reported
scale scores for the online forms were based on the conversions obtained through equating 
methodology that is discussed later. For the spring 2014 data, the online Form 1 scores tended 
to be higher on average than the paper Form 1 scores for all tests. Though online Form 2 had 
higher raw score means than those of online Form 1 (except reading), their scale score means
were similar. 


Table 4. Descriptive Statistics of Raw and Scale Scores of all Test Forms
for Spring 2014


                                 Raw Score                     Scale Score
Form      Test         N      Mean   SD     Min  Max     Mean   SD    Min  Max
Online_1  English      1801   45.04  13.94   7   75      21.39  5.95   5   36
          Mathematics  1801   31.38  11.63   7   60      21.30  5.26  11   36
          Reading      1801   25.17   7.65   2   40      23.56  6.43   4   36
          Science      1801   22.57   7.46   4   40      22.12  5.23   8   36
Paper_1   English      1987   42.87  14.50  10   75      20.47  6.12   7   36
          Mathematics  1987   30.80  11.49   7   60      21.02  5.15  11   36
          Reading      1987   22.62   7.89   3   40      21.47  6.43   5   36
          Science      1987   21.14   7.26   2   40      21.14  5.03   5   36
Online_2  English      1805   49.57  14.27   6   75      21.01  5.95   4   36
          Mathematics  1805   34.04  12.40   5   60      21.46  5.06  10   36
          Reading      1805   24.32   7.66   4   40      23.36  6.26   7   36
          Science      1805   22.86   7.83   4   40      21.95  5.39   8   36


Raw and Scale Score Mean Differences, Effect Sizes, and t-tests of Mean Differences. 
Scores across modes can be compared either on raw scores or scale scores by applying
the paper version conversions to both the paper and online forms. Since there are more raw
score points than scale score points for the ACT, comparability at the raw score level is a more
stringent requirement than comparability at the scale score level. However, only differences
at the scale score level could have any practical impact because decisions are usually made
based on scale scores. 


Figures 4 and 5 are graphical presentations of the raw score and scale score mean differences 
across modes. Mean differences, effect sizes, and p-values from t-tests of mean differences 
for raw and scale scores are presented in Table 5. The effect sizes were calculated by dividing 
the mean differences by the pooled standard deviations across modes for each test. As shown 
in Table 5, in spring 2014, the online group tended to have higher mean scores than the paper 
group for all tests. The mean differences were all statistically significant for raw scores and 
scale scores, except for the mathematics test. 
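

As a worked illustration of the effect size calculation described above, the sketch below recomputes the raw score comparisons from the rounded summary statistics in Table 4. The report does not give the exact pooling formula or the variant of the t-test used, so the degrees-of-freedom-weighted pooled standard deviation, the equal-variance t statistic, and the normal approximation to the p-value are all assumptions; the function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled SD weighted by degrees of freedom (an assumed pooling formula)."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def compare_means(n1, m1, sd1, n2, m2, sd2):
    """Return (mean difference, effect size, t statistic, approximate two-sided p)."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    diff = m1 - m2
    effect_size = diff / sp
    t = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    # With several thousand examinees the t distribution is close to normal,
    # so the two-sided p-value is approximated from the normal tail.
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, effect_size, t, p


# Raw score summary statistics from Table 4: (N, mean, SD).
online_1 = {
    "English": (1801, 45.04, 13.94),
    "Mathematics": (1801, 31.38, 11.63),
    "Reading": (1801, 25.17, 7.65),
    "Science": (1801, 22.57, 7.46),
}
paper_1 = {
    "English": (1987, 42.87, 14.50),
    "Mathematics": (1987, 30.80, 11.49),
    "Reading": (1987, 22.62, 7.89),
    "Science": (1987, 21.14, 7.26),
}

results = {test: compare_means(*online_1[test], *paper_1[test]) for test in online_1}
for test, (diff, es, t, p) in results.items():
    print(f"{test:12s} diff={diff:5.2f}  effect size={es:4.2f}  t={t:5.2f}  p={p:.4f}")
```

Because the inputs are rounded summary statistics, the output agrees with Table 5 only up to rounding: the effect sizes round to the tabled values, while the reading mean difference comes out as 2.55 rather than 2.56 and the mathematics p-value differs slightly from the tabled 0.1204.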


Figure 4. Raw score mean comparisons across modes for spring 2014
Figure 5. Scale score mean comparisons across modes for spring 2014


Table 5. Raw and Scale Score Mean Differences across Modes (Online minus
Paper) for Spring 2014


                    Raw Score Comparison                  Scale Score Comparison
              Mean                                  Mean
Test          Difference  Effect Size  t-test p     Difference  Effect Size  t-test p
English       2.17        0.15         <.0001       0.93        0.15         <.0001
Mathematics   0.58        0.05         0.1204       0.28        0.05         0.0942
Reading       2.56        0.33         <.0001       2.09        0.32         <.0001
Science       1.43        0.19         <.0001       0.98        0.19         <.0001


Raw and Scale Score Cumulative Frequency Distributions and Kolmogorov-Smirnov
Test of Equivalency of Distributions. Raw score and scale score frequency distributions 
and cumulative frequency distributions were also compared across modes. The plots of the 
relative cumulative frequency distributions of proportion correct raw scores and scale scores 
are shown in Figures 6 and 7. For spring 2014, scores tended to be higher for the online group 
than for the paper group for all tests except the mathematics test. 


Figure 6. Relative cumulative frequency distributions of proportion correct raw scores
for spring 2014
Figure 7. Relative cumulative frequency distributions of scale scores for spring 2014


The Kolmogorov-Smirnov (KS) test of equivalency of distributions was conducted for the 
raw and scale scores for each test. As with the results of the t-tests of mean differences, the
KS tests showed that the differences between the online and paper raw and scale score
distributions were statistically significant for all tests except mathematics.
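

The two-sample KS statistic is the largest vertical gap between the two empirical cumulative distributions, which is the same gap visible in plots such as Figures 6 and 7. The sketch below is a generic illustration using only the Python standard library, not the procedure used in the study. The p-value uses a standard large-sample approximation to the Kolmogorov distribution, which is only approximate for discrete, heavily tied data such as test scores. The function names and the toy scores are hypothetical.

```python
import math
from bisect import bisect_right


def ks_two_sample(x, y):
    """Largest absolute difference between the two empirical CDFs."""
    xs, ys = sorted(x), sorted(y)
    d = 0.0
    for value in sorted(set(xs) | set(ys)):
        f1 = bisect_right(xs, value) / len(xs)  # share of x at or below value
        f2 = bisect_right(ys, value) / len(ys)  # share of y at or below value
        d = max(d, abs(f1 - f2))
    return d


def ks_asymptotic_p(d, n1, n2, terms=100):
    """Large-sample approximation to the two-sided KS p-value."""
    ne = n1 * n2 / (n1 + n2)
    lam = (math.sqrt(ne) + 0.12 + 0.11 / math.sqrt(ne)) * d
    if lam < 0.2:
        return 1.0  # the series converges slowly here; p is 1 to many decimals
    series = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                 for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * series))


# Toy scores for two small groups (hypothetical values).
online_scores = [1, 2, 3, 4]
paper_scores = [3, 4, 5, 6]
d = ks_two_sample(online_scores, paper_scores)
p = ks_asymptotic_p(d, len(online_scores), len(paper_scores))
print(d, p)
```

A KS test is sensitive to differences anywhere in the distribution, not only in the mean, which is why it complements the t-tests reported in Table 5.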


Correlations, Effective Weights, and Cronbach’s Alpha. Correlations among tests and 
effective weights of each test were also calculated to examine whether relationships between 
tests were consistent across modes. Measurement precision of scores from the two modes 
was examined by calculating Cronbach’s alpha. Reported in Table 6 are the scale score 
correlations, effective weights, and Cronbach’s alpha. These values were all very similar 
across modes. 
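

The two statistics in this comparison can be sketched as follows. Cronbach's alpha follows its standard formula. For effective weights the report gives no formula, so the sketch assumes a common definition: each test's covariance with the sum of the four tests, divided by the variance of that sum, so that the weights add to one. The function names and the toy data are hypothetical.

```python
from statistics import mean, pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha; rows are examinees, columns are item scores."""
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def covariance(x, y):
    """Population covariance of two equal-length lists."""
    mx, my = mean(x), mean(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)


def effective_weights(test_scores):
    """Share of composite variance attributable to each test (assumed definition).

    `test_scores` maps a test name to a list of scores, with examinees in
    the same order in every list.
    """
    names = list(test_scores)
    composite = [sum(values) for values in zip(*(test_scores[n] for n in names))]
    composite_var = covariance(composite, composite)
    return {n: covariance(test_scores[n], composite) / composite_var for n in names}


# Toy data (hypothetical): three perfectly consistent items, five examinees.
items = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
alpha = cronbach_alpha(items)

scores = {
    "English": [20, 25, 30, 18, 22],
    "Mathematics": [22, 24, 29, 17, 21],
    "Reading": [19, 27, 31, 16, 24],
    "Science": [21, 23, 28, 19, 22],
}
weights = effective_weights(scores)
print(alpha, weights)
```

Under this definition the weights are unchanged if the composite is an average rather than a sum, because the scaling constant cancels between numerator and denominator.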


Table 6. Scale Score Correlations, Effective Weights, and Cronbach's Alpha
for Spring 2014


                                 Online                                      Paper
                   English  Mathematics  Reading  Science     English  Mathematics  Reading  Science
English            1.00                                       1.00
Mathematics         .74     1.00                               .75     1.00
Reading             .82      .69         1.00                  .81      .67         1.00
Science             .77      .80          .76     1.00         .76      .79          .74     1.00
Effective Weight    .26      .22          .28      .23         .28      .22          .28      .22
Cronbach's Alpha    .93      .92          .87      .86         .93      .92          .87      .86


p-values, Omission Rates, and Option Analyses. Item difficulties (p-values) were computed
for each mode. Figure 8 displays the p-value differences (online minus paper) across modes, 
with positive differences indicating that the item was easier for the online administration. 
Figure 8 shows that the items tended to be easier for the online administration, especially for 
items that appeared later in the test. 


Consistent with the effect size differences observed in Table 5, the p-value differences were 
the smallest for the mathematics test and the largest for the reading test in spring 2014. For 
the mathematics test, the item p-value differences were mostly within the range of -0.05 to 
0.05, and the direction of the differences seemed to vary randomly. For English and science,
items in the latter part of the test were consistently easier for the online administration, and for
reading, almost all items were easier for the online administration.


Figure 8. Needle plots of p-value differences across modes for spring 2014
(vertical axis: p-value difference, online minus paper; horizontal axis: item position)


The omission rate (i.e., the proportion of missing responses) for each item was also computed 
and the differences of omission rate across modes were compared. Figure 9 shows the 
omission rate differences (online minus paper) across modes, with positive differences 
indicating that the item had a larger omission rate for the online administration. As shown in 
Figure 9, across all four tests, the paper group consistently had a higher omission rate than the 
online group for the latter half of the tests, except the last few items of the mathematics test. 

In addition, the proportion of examinees choosing each incorrect option was also examined for
each item across modes, but no obvious patterns were found.

Figure 9. Needle plots of omission rate differences across modes for spring 2014
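
The item-level comparisons behind Figures 8 and 9 can be sketched as follows. This is an illustration under stated assumptions, not the study's code: responses are coded 1 (correct), 0 (incorrect), or None (omitted), omitted responses count as not correct when computing p-values, and all function names and data are hypothetical.

```python
def item_stats(responses):
    """Return (p_values, omission_rates), one entry per item."""
    n = len(responses)
    k = len(responses[0])
    p_values = [sum(1 for row in responses if row[i] == 1) / n for i in range(k)]
    omit_rates = [sum(1 for row in responses if row[i] is None) / n for i in range(k)]
    return p_values, omit_rates


def mode_differences(online, paper):
    """Online-minus-paper differences in p-values and omission rates."""
    p_on, o_on = item_stats(online)
    p_pa, o_pa = item_stats(paper)
    p_diff = [a - b for a, b in zip(p_on, p_pa)]
    omit_diff = [a - b for a, b in zip(o_on, o_pa)]
    return p_diff, omit_diff


# Hypothetical two-item test taken by four examinees per mode.
online = [[1, 1], [1, None], [0, 1], [1, 1]]
paper = [[1, 0], [0, None], [0, None], [1, 1]]
print(mode_differences(online, paper))
```

A positive p-value difference indicates an item that was easier online, and a negative omission rate difference indicates an item omitted more often on paper, matching the sign conventions of the figures.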


Adjustments for Mode Differences. Due to the differences observed between the online and
paper scores as discussed above, equating methodology was used for the multiple-choice
tests so that the college reportable scores are comparable regardless of the conditions (i.e.,
mode and time limit) under which students took the tests. Consistent with the methodology
used for equating paper forms of the ACT, the equipercentile method with postsmoothing was
used to link the online test scores to the paper form scores, using a randomly equivalent
groups design.


Equating methodology adjusted for the potential mode effects for each test and created raw to
scale score conversion tables for the online forms that were different from the corresponding
paper conversions. These conversions are referred to as online conversions or adjusted
conversions to differentiate them from the paper conversions. Figure 10 shows the raw to scale
score conversions for the two online forms together with their counterpart paper conversions.
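
The core of the equipercentile idea can be sketched in a few lines: each online raw score is mapped to the paper raw score that has the same percentile rank. The sketch below is a simplified illustration only. It omits the postsmoothing step described above, works on raw score frequency tables rather than scale scores, and uses hypothetical function names and data.

```python
def percentile_ranks(freqs):
    """freqs[x] = number of examinees with raw score x; ranks are in [0, 1]."""
    n = sum(freqs)
    ranks, below = [], 0
    for f in freqs:
        ranks.append((below + f / 2) / n)  # midpoint percentile rank
        below += f
    return ranks


def equipercentile(freq_x, freq_y):
    """Map each raw score on form X to the raw score scale of form Y."""
    n_y = sum(freq_y)
    cum_y, running = [], 0
    for f in freq_y:
        running += f
        cum_y.append(running / n_y)
    equated = []
    for p in percentile_ranks(freq_x):
        # Smallest Y score whose cumulative proportion exceeds p.
        y_u = next((y for y, g in enumerate(cum_y) if g > p), None)
        if y_u is None:  # p == 1: beyond every observed Y score
            equated.append(len(freq_y) - 0.5)
            continue
        below = cum_y[y_u - 1] if y_u > 0 else 0.0
        share = freq_y[y_u] / n_y
        # Interpolate within the unit interval centred on y_u.
        equated.append(y_u - 0.5 + (p - below) / share)
    return equated


# Hypothetical frequency tables: form Y is form X shifted up by one point.
print(equipercentile([1, 2, 3, 2, 1], [0, 1, 2, 3, 2, 1]))
```

With randomly equivalent groups, differences between the two score distributions are attributed to the forms (here, the mode) rather than to the examinees, which is what justifies matching percentile ranks directly.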

Figure 11 displays the differences between the online and paper conversions at each raw
score point for the two online forms, with negative values indicating that the same raw score
was converted to a lower scale score in the online conversion than in the paper conversion.
For spring 2014, except for a few raw score points for Form 2 English and mathematics, the
online conversions, after adjusting for mode effect, resulted in equal or lower scale scores than
the paper conversions.

Figure 10. Raw to scale score conversions for spring 2014

Figure 11. Scale score differences for spring 2014


Two sets of scale scores were calculated for the students who took the online tests by applying
both the online conversions and the paper conversions. The differences between these scale
scores (online minus paper scale score) were also calculated for the two online forms.
Figures 12 and 13 present the distributions of these difference scores for those two forms.
For spring 2014, the majority of the differences (around 50% to nearly 100%) were zero
or one score point for English, mathematics, and science. For reading, however, more than
half of the differences were two scale score points or more.

Figure 12. Distributions of scale score adjustment for the online Form 1 for
spring 2014

Figure 13. Distributions of scale score adjustment for the online Form 2 for
spring 2014
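
The adjustment distributions summarized in Figures 12 and 13 amount to applying two conversion tables to the same raw scores and tabulating the differences. A minimal sketch follows; the conversion tables, raw scores, and function name are all hypothetical and are not taken from the study.

```python
from collections import Counter


def adjustment_distribution(raw_scores, online_conv, paper_conv):
    """Proportion of examinees at each (online minus paper) scale score difference."""
    diffs = Counter(online_conv[r] - paper_conv[r] for r in raw_scores)
    n = len(raw_scores)
    return {d: count / n for d, count in sorted(diffs.items())}


# Hypothetical raw-to-scale conversions for three raw score points.
online_conv = {10: 20, 11: 20, 12: 21}
paper_conv = {10: 20, 11: 21, 12: 23}
print(adjustment_distribution([10, 10, 11, 12], online_conv, paper_conv))
```

Because the same examinees and raw scores enter both conversions, every nonzero difference reflects the mode adjustment alone.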


Online Timing Re-evaluation. As mentioned earlier, in the spring 2014 mode comparability 
study the online administration added five minutes to the current paper administration time 
for the reading and science tests based on the recommendations from the fall 2013 timing 
study. However, the limitations of that timing study made it necessary to continue to gather 
information to inform the timing decisions. Because the mode comparability studies were
conducted in an operational testing environment with a paper control group, they
provided information for timing decisions that was less confounded than that from the fall 2013
timing study. Results from analyses presented in previous sections were considered together
with the following additional information from the student survey and online item latency 
information to inform the online timing decisions in spring 2014. 


Survey Results on Timing-Related Questions. In the student survey, students were asked
whether they felt they had enough time to finish each of the tests. About 1,500 students, 
approximately two thirds of whom took the online versions of the tests, completed the survey 
in spring 2014. Table 7 presents the spring 2014 survey results related to this question. 
Except for writing, a higher percentage of students either agreed or strongly agreed that 
they had enough time to finish the test for the online administration compared to the paper 
administration. 

Table 7. Percentage of Responses to the Timing-Related Question for Spring 2014

Survey statement: "I felt I had enough time to finish the ... test"

                                   Neither                          Either Agree   Either Disagree
              Strongly             Agree nor             Strongly   or Strongly    or Strongly
Test          Agree      Agree     Disagree   Disagree   Disagree   Agree          Disagree
Online
English       40         39        7          9          4          79             12
Mathematics   21         32        13         24         9          53             33
Reading       21         34        12         21         10         55             31
Science       15         32        17         23         11         47             34
Writing       19         28        16         18         14         48             32
Paper
English       27         36        11         18         7          62             25
Mathematics   14         34        12         25         14         49             38
Reading       11         26        12         30         18         37             48
Science       9          29        15         29         17         38             45
Writing       29         42        11         9          7          71             17


Online Form Item Response Time. Item latency information was examined for the two online
forms. Figure 14 presents the average time spent on each item for all four multiple-choice 
tests. If the time spent on the last few items of each test was significantly less than on the other 
items, the test may be speeded. However, no such evidence was found for the online tests. 
Note that the peaks shown in the graphs are usually the first item associated with a passage, 
which included the time spent on reading the passage. 

Figure 14. Average time spent on each item for spring 2014
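
One simple descriptive way to look for the end-of-test pattern described above is to compare the average time on the last few items with the average time on the earlier items. The sketch below is an illustration only: the function name, the choice of the last five items, and the timing values are hypothetical, and the report does not state that this particular ratio was computed.

```python
from statistics import mean


def end_of_test_time_ratio(mean_item_times, last_k=5):
    """Ratio of mean time on the last_k items to mean time on the earlier items."""
    earlier = mean(mean_item_times[:-last_k])
    final = mean(mean_item_times[-last_k:])
    return final / earlier


# Hypothetical average seconds per item for a ten-item test.
steady = [30] * 10
rushed = [30] * 5 + [10] * 5
print(end_of_test_time_ratio(steady), end_of_test_time_ratio(rushed))
```

A ratio well below 1 would suggest that examinees rushed through the final items; passage-based tests need extra care because the first item of each passage absorbs the reading time.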


Online Timing Decision. Based on the results of the spring 2014 mode comparability 
analyses, student survey, and item latency information, it was decided that the extra five 
minutes for the online administration of the reading and science tests should be removed, 
resulting in the same testing time regardless of whether the test is administered online or on 
paper. 


Phase II Mode Comparability Analyses and Results for Multiple-Choice Tests 


IRT Analysis. Mode effects were examined under item response theory (IRT) at both the
test and item levels by comparing the test characteristic curves (TCCs) and item parameters
across modes, using the three-parameter logistic IRT model. Figure 15 contains plots of the
TCCs across modes for each subject. Consistent with the patterns observed in Figures 6 and 7
for the raw and scale score relative cumulative frequency distributions, the across-mode TCC
difference was smallest for the mathematics test and largest for the reading test in the
spring 2014 study.

Figure 15. Test characteristic curves across modes for spring 2014
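
For readers less familiar with the model, the following sketch shows the three-parameter logistic item response function and the test characteristic curve obtained by summing it over items. The scaling constant D = 1.7 is an assumption (the report does not state which scaling was used), and the item parameters below are hypothetical.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model.

    a: discrimination, b: difficulty, c: lower asymptote (pseudo-guessing).
    """
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Expected raw score at ability theta; items is a list of (a, b, c)."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical parameters for a three-item test.
items = [(1.0, -0.5, 0.20), (0.8, 0.0, 0.25), (1.2, 0.7, 0.15)]
for theta in (-2, 0, 2):
    print(theta, round(tcc(theta, items), 3))
```

Evaluating the TCC with parameters estimated separately in each mode and plotting the two curves against theta gives the kind of comparison shown in Figure 15.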


Scatter plots of item parameter estimates from online and paper are presented in
Figures 16-18. Consistent with the comparison of item p-values for spring 2014, the
b-parameter comparison showed that the online items tended to be easier than the paper
items, especially for the reading and science tests. In addition, the c-parameters tended to be
higher for the online items, which indicated that low-performing students had a higher chance
of answering the online items correctly than the paper items.

Figure 16. IRT a-parameter comparison across modes for spring 2014

Figure 17. IRT b-parameter comparison across modes for spring 2014

Figure 18. IRT c-parameter comparison across modes for spring 2014


Factor Analysis. Exploratory factor analysis was conducted to explore the dimensionality and
construct equivalency of the online and paper tests. Eigenvalue scree plots for each test were
examined across modes, as shown in Figure 19.

Figure 19. Eigenvalue scree plot for spring 2014



The data were fit with both a one-factor and a two-factor model. Table 8 presents the criteria
used for evaluating model fit, and Table 9 contains several fit indices resulting from fitting
the one- and two-factor models for the four multiple-choice tests across modes. Table 9 also
includes the fit statistic differences (DIFF) in fitting two- or one-factor models. The bolded
numbers in Table 9 are values that did not meet the criteria presented in Table 8. As shown
in Table 9, all statistics for the spring 2014 data indicated good model fit for the one-factor
model except for a couple of statistics for the online reading test. Compared with the one-factor
model, the use of the two-factor model did not seem to improve the model fit substantially
except for the online reading test. However, based on the principle of parsimony, the one-factor
model was considered to be adequate and the factor loadings of each test for the one-factor
model were compared across modes. Table 10 presents the descriptive statistics of the factor
loadings of each mode and the correlations of the factor loadings across the two modes.


Table 8. Criteria for Good Model Fit 


Fit Statistic Value 
CFI >=0.95 
TLI >=0.95 
RMSEA <=0.06 
SRMR <=0.08 


Table 9. Fit Statistics of One- and Two-Factor Models for Spring 2014

                             Online                            Paper
Test          Fit Statistic  One Factor  Two Factors  DIFF     One Factor  Two Factors  DIFF
English       CFI            0.97        0.98         0.01     0.96        0.98         0.02
              TLI            0.97        0.98         0.01     0.96        0.98         0.02
              RMSEA          0.03        0.03         0.01     0.04        0.03         0.01
              SRMR           0.05        0.05         0.01     0.06        0.05         0.01
Mathematics   CFI            0.97        0.99         0.02     0.95        0.99         0.03
              TLI            0.97        0.99         0.02     0.95        0.99         0.03
              RMSEA          0.03        0.02         0.01     0.04        0.02         0.02
              SRMR           0.06        0.04         0.02     0.06        0.04         0.02
Reading       CFI            0.98        0.99         0.02     0.93        0.98         0.05
              TLI            0.98        0.99         0.02     0.92        0.98         0.06
              RMSEA          0.03        0.01         0.01     0.05        0.03         0.02
              SRMR           0.04        0.03         0.01     0.07        0.04         0.03
Science       CFI            0.98        0.99         0.01     0.96        0.98         0.03
              TLI            0.98        0.99         0.01     0.95        0.98         0.03
              RMSEA          0.02        0.02         0.01     0.03        0.02         0.01
              SRMR           0.05        0.04         0.01     0.05        0.04         0.01


Table 10. Descriptive Statistics and Correlation of Factor Loadings across Modes
for Spring 2014

Test          Mode     Mean   SD     Minimum   Maximum   Correlation
English       Online   0.51   0.11   0.25      0.70      .88
              Paper    0.52   0.11   0.26      0.75
Mathematics   Online   0.53   0.10   0.25      0.72      .90
              Paper    0.52   0.11   0.26      0.70
Reading       Online   0.50   0.12   0.22      0.73      .87
              Paper    0.49   0.11   0.26      0.76
Science       Online   0.48   0.13   0.23      0.71      .87
              Paper    0.47   0.11   0.26      0.69


Generalizability Analysis. Raw score reliability was further examined based on the results
from a multivariate generalizability analysis under a person-crossed-with-item design, treating
the different content categories as different variables. The generalizability coefficients and
dependability indices (phi coefficients) are reported in Table 11, together with the Cronbach’s
alpha reliability already reported in Table 6 to facilitate comparison. The phi coefficients were
slightly lower than the generalizability coefficients, and these two coefficients were both very
close to the alpha estimates. As with alpha, the reliability indices from the generalizability
analyses showed barely any differences across modes.

In addition, correlations between the content areas, the variance components of each facet,
and the contribution of each content category to the total variance from the generalizability
analysis results were also compared across modes. No noteworthy differences were found.


Table 11. Raw Score Generalizability Coefficient, Phi Coefficient, and Alpha for 
Spring 2014 


Online Paper 


English Mathematics Reading Science English Mathematics Reading Science 


Generalizability 0.93 0.92 0.88 0.86 0.93 0.92 0.88 0.86 
Phi 0.92 0.91 0.87 0.85 0.93 0.91 0.87 0.84 
Alpha 0.93 0.92 0.87 0.86 0.93 0.92 0.87 0.86 
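
To show how the two coefficients relate, here is a sketch for the simpler univariate person-crossed-with-item design. It is a deliberate simplification: the study used a multivariate analysis with content categories as separate variables, which this sketch does not reproduce. The function name and data are hypothetical.

```python
def g_study_p_by_i(scores):
    """Generalizability and phi coefficients for a fully crossed p x i design."""
    n_p, n_i = len(scores), len(scores[0])
    grand = sum(map(sum, scores)) / (n_p * n_i)
    person_means = [sum(row) / n_i for row in scores]
    item_means = [sum(row[j] for row in scores) / n_p for j in range(n_i)]

    # Sums of squares for persons, items, and the residual (interaction) term.
    ss_p = n_i * sum((m - grand) ** 2 for m in person_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in item_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_pi = ss_total - ss_p - ss_i

    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_pi = ss_pi / ((n_p - 1) * (n_i - 1))

    # Estimated variance components (negative estimates set to zero).
    var_p = max((ms_p - ms_pi) / n_i, 0.0)
    var_i = max((ms_i - ms_pi) / n_p, 0.0)
    var_pi = ms_pi

    g_coef = var_p / (var_p + var_pi / n_i)           # relative error only
    phi = var_p / (var_p + (var_i + var_pi) / n_i)    # absolute error
    return g_coef, phi


# Four hypothetical examinees answering three dichotomous items.
print(g_study_p_by_i([[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]))
```

In this univariate design the generalizability coefficient coincides with Cronbach's alpha, and phi can never exceed it because phi also counts item difficulty variance as error. Both properties are in line with the pattern in Table 11.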


Differential Item Functioning. The purpose of conducting differential item functioning (DIF)
analyses was to examine whether some items function differently across modes for examinees 
at the same overall proficiency level on the test and, if so, whether sources of that difference 
can be identified. Recall that a qualitative content comparison was made for items across 
modes, which was used as a basis for judging the practical significance of the statistically 
identified items. 


The qualitative comparison documented differences across modes that may affect student
performance. For example, one general difference was that line breaks in passages, stems,
and options usually differed between the online and paper versions, but this probably
would not have any effect on students’ performance. Other differences might or might not
affect performance. For example, the paper version might have the entire passage or entire
set of tables and figures visible on a single page, whereas the online version might require
scrolling. The online version used highlighting, whereas the paper version used underlining
or references to line numbers. Items that were potentially affected by these differences were
identified.


The item p-value differences as presented in Figure 8 as well as omission rate comparisons
in Figure 9 all contributed to the evaluation of item DIF, because the groups were randomly
equivalent. In addition, DIF was examined by using the Mantel-Haenszel procedure
(Camilli & Shepard, 1994; Mantel & Haenszel, 1959).


The Mantel-Haenszel procedure calculates the weighted average of the odds ratios across
all score levels. In this study, items with odds ratio values smaller than 0.5 or larger than 2
were flagged for further review. When controlling for raw or scale scores before applying
the equating methodology, one reading, one science, and two English items were flagged.
A few more items were flagged when controlling for scale scores after using the equating
methodology to adjust for mode effects. Flagged items were those with the largest p-value
differences, almost always favoring the online mode. A comparison of the statistically identified
items and what was documented in the qualitative comparison did not reveal any concrete
sources of DIF for these items.
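
The Mantel-Haenszel common odds ratio can be sketched as follows. Each stratum is one matching score level and holds the counts of correct and incorrect responses in the two groups. Which mode serves as the reference group is an assumption of this sketch (the report does not say), the counts are hypothetical, and only the 0.5 and 2 flagging thresholds come from the text.

```python
def mantel_haenszel_odds_ratio(strata):
    """strata: iterable of (ref_correct, ref_incorrect, focal_correct, focal_incorrect)."""
    numerator = denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # empty score level contributes nothing
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def is_flagged(odds_ratio, low=0.5, high=2.0):
    """Apply the review thresholds stated in the text."""
    return odds_ratio < low or odds_ratio > high


# Hypothetical counts at two score levels for one item.
balanced = [(10, 10, 10, 10), (20, 5, 20, 5)]
ratio = mantel_haenszel_odds_ratio(balanced)
print(ratio, is_flagged(ratio))
```

An odds ratio near 1 means that examinees at the same score level had similar odds of answering the item correctly in both modes.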


Scale Score Moments and Measurement Precision After Applying Equating 
Methodology. Scale score properties, including reliability, standard error of measurement 
(SEM), and conditional SEM were examined across modes and across the online forms. These 
properties were investigated based on Lord’s (1965) four-parameter beta compound binomial
model (Kolen, Hanson, & Brennan, 1992). Scale scores for the online forms were obtained by 
using equating methodology to adjust for any mode effects. 


Table 12 presents the scale score moments, SEM, and reliability of each form. Mode effects
can be examined by comparing Online_1 and Paper_1 in light of the form differences that
can be obtained by comparing Online_1 and Online_2. Scale score descriptive statistics were
similar across the one paper and two online forms after applying equating methodology for
all four subjects. English and mathematics had slightly higher scale score reliability and lower
SEM than reading and science. In general, the scale score reliability and SEM were very close
between paper and online. In cases where slight differences existed, the differences between
modes tended to be smaller than the differences between the two online forms.


Table 12. Scale Score Moments, SEM, and Reliability for Spring 2014

Test          Form       Mean    SD     Skewness   Kurtosis   SEM    Reliability
English       Online_1   20.47   6.13   0.32       2.63       1.76   0.92
              Paper_1    20.47   6.12   0.29       2.62       1.71   0.92
              Online_2   20.43   6.12   0.26       2.61       1.72   0.92
Mathematics   Online_1   21.01   5.21   0.58       2.60       1.57   0.91
              Paper_1    21.02   5.15   0.56       2.59       1.58   0.91
              Online_2   21.02   5.18   0.57       2.57       1.45   0.92
Reading       Online_1   21.48   6.46   0.30       2.35       2.35   0.87
              Paper_1    21.47   6.43   0.32       2.37       2.29   0.87
              Online_2   21.56   6.49   0.32       2.36       2.41   0.86
Science       Online_1   21.15   5.12   0.41       3.28       2.05   0.84
              Paper_1    21.14   5.03   0.40       3.29       2.11   0.82
              Online_2   21.07   5.07   0.40       3.24       1.94   0.85


Figure 20 contains plots of the conditional SEM at each true scale score point for all three
forms. Differences in conditional SEM were quite small among the three forms, and mode
differences tended to be smaller than the differences between the two online forms.

Figure 20. Conditional SEM for spring 2014


ACT Writing Test. In spring 2014, the writing test was holistically rated with a score range of 
2 to 12 and a testing time limit of 30 minutes. More than half of the students in the spring 2014 
final data sample took the ACT with writing. The mode effect for the writing test was examined 
by comparing the online and paper writing mean scores and by comparing the conditional 
writing scores after controlling for the English scale scores. 


Table 13 presents the descriptive statistics, mean differences, effect sizes, and t-test p-values 
between paper and online for writing and for English. Though random assignment of the online 
and paper forms was done within the group of students who registered for the ACT with writing, 
group equivalency might be affected by data cleaning. The purpose of including English scale 
scores (based on the online conversions) was to obtain additional evidence for the equivalency 
of the two groups taking the online versus the paper writing test. The effect size of the
between-mode group difference for English was small, and the t-test of mean difference was not
statistically significant at the .05 level, providing additional evidence for the equivalency of the 
two groups for the writing test mode comparison. The small effect size and the relatively large 
t-test p-value for writing indicated that mode effects were not significant for the writing test in 
the spring 2014 special study. 


Table 13. Across Mode Comparisons for Students Taking the ACT Writing Test
for Spring 2014

          Online                 Paper                  Mean         Effect
Test      N      Mean    SD      N      Mean    SD      Difference   Size     t-test p
English   1059   21.58   6.37    1255   21.31   6.24    0.27         0.04     0.29
Writing   1059   7.26    1.74    1255   7.22    1.57    0.04         0.02     0.57


The ACT writing scores were also examined by comparing the score distributions across
modes conditioning on the English scale scores after adjusting for the mode effects, that
is, the paper form applying the paper conversions and the online form applying the online
conversions. Figure 21 includes a scatter plot of the online and paper writing scores against
students’ English scale scores, and Figure 22 shows the conditional mean writing scores for
each mode. Though there seemed to be a weak trend that the conditional online mean scores
were slightly lower than the paper mean scores for lower English scores but slightly higher
than paper means for higher English scores, the magnitude of the differences was small for
most of the English scale score points. Since no evidence of significant mode effect for the
writing test was found, no adjustment was made for mode to the ACT writing scores in
spring 2014.

Figure 21. Scatter plot of writing online and paper scores conditioning on English
scale scores for spring 2014

Figure 22. Average writing online and paper scores conditioning on English scale
scores for spring 2014


Spring 2015 Mode Comparability Study 

A second mode comparability study was conducted in spring 2015. A similar data collection 
design and procedure was used for the spring 2015 study as was used for the spring 2014 
study. Students were randomly assigned to take one of the three ACT forms (Online_1, 
Online_2, or Paper_1) that were administered in an operational testing setting on one of the 
ACT national test dates. Online testing was conducted on school-provided desktop or laptop 
computers. 


There were two main differences between the spring 2014 and 2015 mode comparability
studies. First, for the spring 2015 study, all the online tests had the same time limits as their
paper counterparts; the extra five minutes added to the online reading and science tests in the
spring 2014 study were eliminated based on that study’s findings. Second, different versions
of the ACT writing test were administered in the two mode comparability studies. The writing
test administered in the spring 2015 study was the enhanced version that was to be
operationally launched in fall 2015. It was analytically scored, and the administration time
changed from 30 to 40 minutes. The enhanced version of the writing test launched in
September 2015 reported four domain scores and a writing scale score ranging from 1 to 36,
though only students’ domain scores were reported back to schools in this special study. The
participants in the 2015 study came only from those who registered for the ACT without writing
but agreed to participate in a special writing study without receiving college reportable scores.
Students took the writing test either online or on paper. Together with the multiple-choice
tests, two paper and two online prompts were randomly assigned among students participating
in the study.


Data. More than 4,000 students from about 40 schools signed up for the spring 2015 study.
After data cleaning, more than 3,000 students with at least 1,000 records for a form were
included in the spring 2015 final data. A two-phase analysis similar to that done for the
spring 2014 study was conducted for the spring 2015 mode comparability study. The following
sections present the results for the multiple-choice tests and the writing test.


Phase I Mode Comparability Analyses and Results for Multiple-Choice Tests. Phase I
analyses focused on test score distributions and the similarity of item level scores across 
modes. Table 14 contains the sample size of each test form in the spring 2015 study as well 
as the means, standard deviations (SDs), minimums, and maximums of the observed total raw 
scores and scale scores for all three forms. The scale score descriptive statistics for Online_1 
and Online_2 were obtained by applying the paper version conversions of Form 1 and Form 2, 
respectively. On average the online Form 1 scores tended to be higher than the paper Form 1 
scores for all tests. Though online Form 2 had higher raw score means than those of online 
Form 1 (except reading and science), their scale score means were similar. 


Table 14. Descriptive Statistics of Raw and Scale Scores of all Test Forms
for Spring 2015

                               Raw Score                    Scale Score
Form       Test          N     Mean    SD      Min   Max    Mean    SD     Min           Max
Online_1   English       1092  43.62   14.10   9     74     20.79   5.98   6             36
           Mathematics   1092  30.02   11.76   5     60     20.69   5.20   10            36
           Reading       1092  23.28   7.59    4     40     21.99   6.24   [illegible]   36
           Science       1092  20.73   7.43    2     40     20.86   5.17   5             36
Paper_1    English       1056  41.26   14.43   5     74     19.79   6.03   4             36
           Mathematics   1056  29.74   11.78   5     60     20.58   5.16   10            36
           Reading       1056  22.00   7.57    2     40     20.91   6.08   4             36
           Science       1056  20.72   7.20    3     40     20.80   4.96   6             36
Online_2   English       1044  44.54   14.83   4     75     20.46   6.27   3             36
           Mathematics   1044  30.24   11.46   7     58     20.82   5.05   12            35
           Reading       1044  22.66   8.00    3     40     21.77   6.14   5             36
           Science       1044  20.30   7.03    3     39     21.36   5.00   6             36


Figures 23 and 24 provide graphical presentations of the raw score and scale score mean
differences across modes. Mean differences, effect sizes, and p-values from t-tests of
mean differences for raw and scale scores are presented in Table 15. The effect sizes were
calculated by dividing the mean differences by the pooled standard deviations across modes
for each test. In the spring 2015 study, students who took the online tests still performed
better than the students who took the paper tests. However, the mean differences were only
statistically significant for the English and reading tests.

Figure 23. Raw score mean comparisons across modes for spring 2015

Figure 24. Scale score mean comparisons across modes for spring 2015


Table 15. Raw and Scale Score Mean Differences across Modes (Online minus
Paper) for Spring 2015

              Raw Score Comparison                   Scale Score Comparison
              Mean                                   Mean
Test          Difference   Effect Size   t-test p    Difference   Effect Size   t-test p
English       2.36         0.17          0.0001      1.00         0.17          0.0001
Mathematics   0.28         0.02          0.5808      0.11         0.02          0.6199
Reading       1.28         0.17          <.0001      1.08         0.18          <.0001
Science       0.01         0.00          0.9742      0.06         0.01          0.7717
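
The effect size calculation can be checked against the reported values. The sketch below uses the English raw score statistics from Table 14 for the two groups whose mean difference equals the 2.36 shown in Table 15 (N = 1092, mean 43.62, SD 14.10 and N = 1056, mean 41.26, SD 14.43). The degrees-of-freedom weighting in the pooled standard deviation is an assumption, since the report says only that the pooled standard deviation was used; the function names are hypothetical.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation, weighting each variance by its degrees of freedom."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def effect_size(mean1, mean2, n1, sd1, n2, sd2):
    """Standardized mean difference (group 1 minus group 2)."""
    return (mean1 - mean2) / pooled_sd(n1, sd1, n2, sd2)


english = effect_size(43.62, 41.26, 1092, 14.10, 1056, 14.43)
print(round(english, 2))  # 0.17
```

The result rounds to the 0.17 reported for the English raw score comparison.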

The plots of the relative cumulative frequency distributions of proportion correct raw scores
and scale scores are shown in Figures 25 and 26. For spring 2015, the online group seemed
to score higher than the paper group only for the English and reading tests. The differences
between the online and paper groups for the English test appeared to be similar for the
spring 2014 and 2015 studies, while the differences for the reading test tended to be smaller
for the spring 2015 study than those for spring 2014. The Kolmogorov-Smirnov (KS) test of
equivalency of distributions was conducted for the raw and scale scores for each test. For the
spring 2015 study, the KS tests were statistically significant for the English and reading tests,
but not for the mathematics and science tests.

Figure 25. Relative cumulative frequency distributions of proportion correct raw
scores for spring 2015

Figure 26. Relative cumulative frequency distributions of scale scores for
spring 2015


Table 16 contains the scale score correlations, effective weights, and Cronbach’s alpha 
reliability for the online and paper tests. These values were all very similar across modes. 


Table 16. Scale Score Correlations, Effective Weights, and Cronbach’s Alpha
for Spring 2015

                   Online                                    Paper
                   English  Mathematics  Reading  Science    English  Mathematics  Reading  Science
English            1.00                                      1.00
Mathematics        .15      1.00                             .15      1.00
Reading            .81      .68          1.00                .80      .66          1.00
Science            .15      .80          .74      1.00       .16      .19          .12      1.00
Effective Weight   .27      .23          .28      .23        .28      .23          .27      .22
Cronbach's Alpha   .93      .92          .86      .86        .93      .92          .86      .85

Item difficulties (p-values) were computed for each mode. Figure 27 shows the p-value
differences (online minus paper) across modes, with a positive difference indicating that
the item was easier for the online administration. Figure 27 shows that the items tended to
be easier for the online administration, especially for items that appeared later in the test.
Consistent with the results shown in Table 15, the mathematics and science tests showed
smaller mode differences in item p-values compared with the English and reading tests. The
items toward the end of the English and reading tests tended to be easier for online. Compared
with spring 2014 results, the mode differences on the reading and science tests were smaller
in the spring 2015 study, probably due to the removal of the extra five minutes for these two
tests.

Figure 27. Needle plots of p-value differences across modes for spring 2015


Figure 28 shows the omission rate (i.e., the proportion of missing responses) difference (online 
minus paper) for each item across modes. The paper group consistently had a higher omission 
rate than the online group for the latter half of the tests across all four tests, except for the last 
few items of the mathematics test. The omission rate was slightly higher for spring 2015 than 
for spring 2014 across all four tests. In addition, the proportion of examinees choosing each
incorrect option was also examined for each item across modes, but no obvious patterns were
found.

Figure 28. Needle plots of omission rate differences across modes for spring 2015


Adjustments to Score Differences. Equating methodology was used to generate raw to 
scale score conversion tables for the online forms that were different from the corresponding 
paper conversions. The raw to scale score conversions for the two online forms together with 
their counterpart paper conversions are shown in Figure 29. Figure 30 displays the differences 
between the online and paper conversions at each raw score point for the two online forms, 
with negative values indicating that the same raw score was converted to a lower scale score 
in the online conversion than in the paper conversion. 

Figure 29. Raw to scale score conversions for spring 2015

Figure 30. Scale score differences for spring 2015


Scale score differences were computed for students who took the online tests as the scale
score from the online conversion minus the scale score from the paper conversion.
Figures 31 and 32 present distributions of the difference scores for the
two online forms. For spring 2015, only a small portion of the difference scores surpassed one
score point for the English and reading tests. Almost all the differences were zero or one score
point for mathematics and science.

Figure 31. Distributions of scale score adjustment for the online Form 1 for
spring 2015

Figure 32. Distributions of scale score adjustment for the online Form 2 for
spring 2015


Online Timing Re-evaluation. Time limits for the online tests were re-evaluated in the
spring 2015 study by also examining student survey data and online item response times.
In spring 2015, 490 students responded to the survey question about whether they felt they
had enough time to finish each of the tests; about 68% of these students took the tests
online. Table 17 contains the survey results for this question. For all five tests, a larger
percentage of online examinees than paper examinees agreed that they had enough time to
finish.
Table 17. Percentage of Responses to the Timing Related Question for Spring 2015


I felt I had                            Neither                           Either Agree   Either Disagree
enough time to     Strongly             Agree nor             Strongly    or Strongly    or Strongly
finish the...test  Agree      Agree     Disagree   Disagree   Disagree    Agree          Disagree
Online
  English          35         38        9          12         6           73             18
  Mathematics      15         34        13         23         15          49             38
  Reading          14         23        11         34         18          37             52
  Science          12         17        18         31         22          29             53
  Writing          39         33        15         8          5           72             13
Paper
  English          18         35        11         24         12          53             36
  Mathematics      7          30        12         32         19          37             52
  Reading          5          18        10         43         25          22             68
  Science          4          23        19         28         26          27             54
  Writing          28         31        20         13         8           59             21


Figure 33 presents the average time spent on each item for all four multiple-choice tests.
No evidence of speededness was found for the online tests. The patterns of time spent
on items were quite similar between the spring 2014 study and the spring 2015 study. The
comparability of online and paper scores without the five extra minutes in the online
administration of the reading and science tests was further evaluated in spring 2015,
and the results confirmed the decision to use the same administration time for the online
and paper versions of all tests.
Figure 33. Average time spent on each item for spring 2015


Phase II Mode Comparability Analyses and Results for Multiple-Choice Tests. Similar
Phase II mode comparability analyses were conducted for the spring 2015 mode comparability
study. Figure 34 contains plots of the test characteristic curves (TCCs) across modes for each 
subject. Consistent with the patterns observed in Figures 25 and 26 for the raw and scale 
score relative cumulative frequency distributions, the between-mode TCC difference was 
smallest for the mathematics and science tests. There were some differences in TCC for the 
English and reading tests. Scatter plots of item parameter estimates from online and paper are 
presented in Figures 35-37. The b-parameter comparison showed that the online items tended 
to be easier than the paper items, especially for the reading test. Online items had higher 
c-parameter values. 
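

To make the TCC comparison concrete, the sketch below computes test characteristic curves under a three-parameter logistic (3PL) model, which is the usual model behind a-, b-, and c-parameter estimates. The scaling constant D = 1.7, the item parameter values, and the function names are assumptions for illustration only; they are not the estimates or the software used in the study.

```python
import math


def p_3pl(theta, a, b, c, D=1.7):
    """Probability of a correct response under the 3PL model (D = 1.7 assumed)."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def tcc(theta, items):
    """Test characteristic curve: expected raw score at ability theta."""
    return sum(p_3pl(theta, a, b, c) for a, b, c in items)


# Hypothetical (a, b, c) estimates for the same three items in each mode.
paper_items = [(1.0, -0.5, 0.20), (0.8, 0.0, 0.15), (1.2, 0.7, 0.25)]
# Online items made slightly easier (lower b) with slightly higher c.
online_items = [(1.0, -0.6, 0.22), (0.8, -0.1, 0.17), (1.2, 0.6, 0.27)]

thetas = [-3 + 0.5 * k for k in range(13)]  # ability grid from -3 to 3
tcc_diff = [tcc(t, online_items) - tcc(t, paper_items) for t in thetas]

for t, d in zip(thetas, tcc_diff):
    print(f"theta={t:+.1f}  online-paper expected raw score difference={d:.3f}")
```

With lower b-parameters and higher c-parameters for every online item, the online TCC lies above the paper TCC at every ability level in this toy example.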
Figure 34. Test characteristic curves across modes for spring 2015
Figure 35. IRT a-parameter comparison across modes for spring 2015
Figure 36. IRT b-parameter comparison across modes for spring 2015
Figure 37. IRT c-parameter comparison across modes for spring 2015


Exploratory factor analysis was conducted to explore the dimensionality and construct
equivalency of the online and paper tests. Eigenvalue scree plots for each test were examined
across modes, as shown in Figure 38. The data were fit with both a one-factor and a two-factor
model.
Figure 38. Eigenvalue scree plot for spring 2015


Table 18 contains several fit indices resulting from fitting the one- and two-factor models for
the four multiple-choice tests across modes. The bolded numbers are values that did not meet
the criteria presented in Table 8, showing that a couple of the fit statistics for the reading and
science online tests were not optimal, and the two-factor model seemed to improve the model
fit for those tests. However, based on the principle of parsimony, the one-factor model was
considered to be adequate, and the factor loadings of each test for the one-factor model were
compared across modes. Table 19 presents the descriptive statistics of the factor loadings of
each mode and the correlations of the factor loadings between the two modes.


Table 18. Fit Statistics of One- and Two-Factor Models for Spring 2015 


                              Online                          Paper
Test          Fit Statistic   One Factor  Two Factors  DIFF   One Factor  Two Factors  DIFF
English       CFI             0.96        0.98         0.02   0.96        0.98         0.03
              TLI             0.96        0.98         0.02   0.96        0.98         0.03
              RMSEA           0.03        0.03         0.01   0.04        0.03         0.01
              SRMR            0.06        0.05         0.01   0.07        0.05         0.01
Mathematics   CFI             0.97        0.99         0.02   0.97        0.99         0.02
              TLI             0.97        0.99         0.03   0.97        0.99         0.02
              RMSEA           0.04        0.02         0.02   0.03        0.02         0.02
              SRMR            0.06        0.05         0.02   0.06        0.05         0.02
Reading       CFI             0.94        0.98         0.05   0.90        0.98         0.07
              TLI             0.93        0.98         0.05   0.90        0.97         0.07
              RMSEA           0.04        0.02         0.02   0.05        0.03         0.03
              SRMR            0.07        0.05         0.02   0.08        0.05         0.03
Science       CFI             0.97        0.99         0.02   0.94        0.98         0.04
              TLI             0.97        0.99         0.02   0.94        0.98         0.04
              RMSEA           0.03        0.02         0.01   0.04        0.02         0.01
              SRMR            0.05        0.05         0.01   0.07        0.05         0.02


Table 19. Descriptive Statistics and Correlation of Factor Loadings across Modes 
for Spring 2015 


Test          Form     Mean   SD     Minimum   Maximum   Correlation
English       Online   0.51   0.12   0.24      0.72
              Paper    0.52   0.12   0.22      0.76
Mathematics   Online   0.53   0.12   0.19      0.74
              Paper    0.54   0.11   0.28      0.69
Reading       Online   0.48   0.12   0.19      0.74
              Paper    0.47   0.13   0.19      0.74
Science       Online   0.47   0.13   0.19      0.71
              Paper    0.46   0.12   0.15      0.70
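

The between-mode correlation of factor loadings is a Pearson correlation computed over items, pairing each item's one-factor loading from the online data with its loading from the paper data. A minimal sketch follows; the loadings are hypothetical values chosen only to fall within the ranges in Table 19, and the function name pearson is illustrative.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical one-factor loadings for the same five items in each mode.
online_loadings = [0.24, 0.45, 0.51, 0.60, 0.72]
paper_loadings = [0.22, 0.47, 0.52, 0.58, 0.76]

r = pearson(online_loadings, paper_loadings)
print(f"correlation of loadings across modes: {r:.2f}")
```

A correlation near 1 indicates that items relate to the underlying factor in much the same way in both modes.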


Generalizability coefficients and dependability indices or phi coefficients are reported in Table
20. Similar to the alpha results, reliability indices from the generalizability analyses showed
barely any differences across modes.


Table 20. Raw Score Generalizability Coefficient, Phi Coefficient, and Alpha for Spring 2015


                   Online                                      Paper
                   English  Mathematics  Reading  Science      English  Mathematics  Reading  Science
Generalizability   0.93     0.92         0.87     0.86         0.93     0.93         0.87     0.86
Phi                0.92     0.91         0.86     0.84         0.92     0.91         0.86     0.83
Alpha              0.93     0.92         0.86     0.86         0.93     0.92         0.86     0.85
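

The three indices in Table 20 can be illustrated with a small persons by items score matrix. The sketch below assumes a single-facet crossed (persons x items) design, which this section of the report does not spell out, and uses hypothetical 0/1 item scores. Under that design the generalizability coefficient equals coefficient alpha, while the phi coefficient is lower whenever items differ in difficulty.

```python
def variance(values):
    """Unbiased sample variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)


def cronbach_alpha(X):
    """Coefficient alpha for a persons-by-items score matrix."""
    k = len(X[0])
    item_vars = [variance([row[j] for row in X]) for j in range(k)]
    total_var = variance([sum(row) for row in X])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def g_and_phi(X):
    """Generalizability and phi coefficients for a crossed p x i design."""
    n_p, n_i = len(X), len(X[0])
    grand = sum(map(sum, X)) / (n_p * n_i)
    p_means = [sum(row) / n_i for row in X]
    i_means = [sum(row[j] for row in X) / n_p for j in range(n_i)]
    ss_p = n_i * sum((m - grand) ** 2 for m in p_means)
    ss_i = n_p * sum((m - grand) ** 2 for m in i_means)
    ss_tot = sum((x - grand) ** 2 for row in X for x in row)
    ms_p = ss_p / (n_p - 1)
    ms_i = ss_i / (n_i - 1)
    ms_res = (ss_tot - ss_p - ss_i) / ((n_p - 1) * (n_i - 1))
    var_p = (ms_p - ms_res) / n_i              # person variance component
    var_i = max((ms_i - ms_res) / n_p, 0.0)    # item variance component
    g = var_p / (var_p + ms_res / n_i)         # relative error only
    phi = var_p / (var_p + (var_i + ms_res) / n_i)  # absolute error
    return g, phi


# Hypothetical responses: 6 examinees (rows) by 4 items (columns).
X = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 1, 0, 0],
     [1, 0, 1, 0], [1, 0, 0, 0], [0, 0, 0, 0]]

alpha = cronbach_alpha(X)
g, phi = g_and_phi(X)
print(f"alpha={alpha:.3f}  generalizability={g:.3f}  phi={phi:.3f}")
```

The ordering in this toy example (phi below the generalizability coefficient, which matches alpha) is the same ordering seen in Table 20.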
The Mantel-Haenszel procedure was used to flag DIF items that needed to be further
reviewed. Three English items were flagged for DIF based on raw scores, and a few more
items were flagged when controlling for scale scores after using the equating methodology.
One reading item and one science item were flagged based on scale scores. No concrete
sources of DIF for those items were identified.
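

For readers unfamiliar with the procedure, the sketch below computes the Mantel-Haenszel common odds ratio for one item and its transformation to the delta metric (MH D-DIF = -2.35 ln alpha). Treating paper examinees as the reference group and online examinees as the focal group is an assumption made here for illustration, as are the counts; the flagging criteria used in the study are not restated in this section and are not reproduced.

```python
import math


def mantel_haenszel(strata):
    """Return (common odds ratio, MH D-DIF) for one item.

    strata: one (A, B, C, D) tuple per matched score level, where
    A, B = reference group correct / incorrect counts and
    C, D = focal group correct / incorrect counts.
    """
    num = den = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # skip empty score levels
        num += a * d / n
        den += b * c / n
    alpha = num / den
    return alpha, -2.35 * math.log(alpha)


# Hypothetical counts at two matched score levels.
no_dif = [(40, 10, 40, 10), (30, 20, 30, 20)]   # equal odds in both groups
favors_focal = [(30, 20, 40, 10)]               # focal group does better

alpha_0, d_0 = mantel_haenszel(no_dif)
alpha_1, d_1 = mantel_haenszel(favors_focal)
print(alpha_0, d_0)
print(alpha_1, d_1)
```

An odds ratio of 1 (D-DIF of 0) indicates no DIF after matching; a positive D-DIF indicates that the item is relatively easier for the focal group.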


Table 21 presents the scale score moments, SEM, and reliability of each form. Figure 39 
contains plots of the conditional SEM of each true scale score point for all three forms. 


Table 21. Scale Score Moments, SEM, and Reliability for Spring 2015 


Test          Form       Mean    SD     Skewness   Kurtosis   SEM    Reliability
English       Online_1   19.79   6.06   0.32       2.71       1.75   0.92
              Paper_1    19.79   6.03   0.27       2.69       1.71   0.92
              Online_2   19.76   6.11   0.31       2.72       1.67   0.93
Mathematics   Online_1   20.65   5.18   0.44       2.44       1.61   0.90
              Paper_1    20.58   5.16   0.44       2.43       1.59   0.91
              Online_2   20.60   5.12   0.40       2.33       1.63   0.90
Reading       Online_1   20.93   6.09   0.27       2.44       2.26   0.86
              Paper_1    20.91   6.07   0.29       2.54       2.28   0.86
              Online_2   21.01   6.18   0.31       2.47       2.16   0.88
Science       Online_1   20.82   5.06   0.23       3.27       2.14   0.82
              Paper_1    20.80   4.96   0.25       3.13       2.11   0.82
              Online_2   20.84   5.05   0.24       3.17       2.17   0.82
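

The SEM and reliability columns of Table 21 are consistent with the conventional relation reliability = 1 - SEM^2 / SD^2. This section does not state the formula, so the relation is an assumption here; the short check below applies it to the SD and SEM values transcribed from Table 21 and compares the result with the reported reliability.

```python
# (SD, SEM, reported reliability) transcribed from Table 21.
table21 = {
    ("English", "Online_1"): (6.06, 1.75, 0.92),
    ("English", "Paper_1"): (6.03, 1.71, 0.92),
    ("English", "Online_2"): (6.11, 1.67, 0.93),
    ("Mathematics", "Online_1"): (5.18, 1.61, 0.90),
    ("Mathematics", "Paper_1"): (5.16, 1.59, 0.91),
    ("Mathematics", "Online_2"): (5.12, 1.63, 0.90),
    ("Reading", "Online_1"): (6.09, 2.26, 0.86),
    ("Reading", "Paper_1"): (6.07, 2.28, 0.86),
    ("Reading", "Online_2"): (6.18, 2.16, 0.88),
    ("Science", "Online_1"): (5.06, 2.14, 0.82),
    ("Science", "Paper_1"): (4.96, 2.11, 0.82),
    ("Science", "Online_2"): (5.05, 2.17, 0.82),
}


def reliability_from_sem(sd, sem):
    """Reliability implied by a score SD and a standard error of measurement."""
    return 1.0 - (sem / sd) ** 2


computed = {
    key: round(reliability_from_sem(sd, sem), 2)
    for key, (sd, sem, _reported) in table21.items()
}

for key, (_sd, _sem, reported) in table21.items():
    print(key, computed[key], reported)
```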
Figure 39. Conditional SEM for spring 2015


ACT Writing Test. Two writing prompts were administered along with one paper and two
online forms of the ACT in the spring 2015 mode comparability study. The online versions of
the two writing prompts were spiraled within the examinees taking the two online ACT forms,
and the paper versions of the two prompts were spiraled within examinees taking the one
paper form. This design resulted in randomly equivalent groups of students taking each mode
and form combination of the writing prompts. Since there was only one paper ACT form, the
number of students taking the paper prompts was about half of the number taking the online
prompts.
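

Spiraling conventionally means handing out the alternate versions in a fixed rotating order, so that consecutive examinees within a room receive different versions and the resulting groups are approximately randomly equivalent. The report does not describe the mechanics used in the study; the sketch below shows only the general idea, with hypothetical examinee IDs and a hypothetical function name.

```python
from collections import Counter
from itertools import cycle


def spiral(examinees, versions):
    """Assign versions to examinees in rotating order (seating order assumed)."""
    return dict(zip(examinees, cycle(versions)))


# Ten hypothetical examinees in seating order within one test center.
examinees = [f"S{i:03d}" for i in range(1, 11)]
assignment = spiral(examinees, ["Prompt 1", "Prompt 2"])

counts = Counter(assignment.values())
print(counts)
```

Because versions alternate within each room, group sizes per version differ by at most one within a room, which is why large form count differences within a test center can signal a departure from the intended design.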


Before examining potential mode effects for the writing prompts, the groups taking the
online and paper versions of each prompt were examined to verify the equivalency of the
groups. First, form distributions within each test center were checked so centers with large
form count differences could be excluded from further analyses. Then, the distributions of
student demographics (e.g., gender and ethnicity) and students’ ACT English scale scores
were compared across the groups of students taking each of the writing mode and form
combinations. The groups were found to be similar in terms of demographics and English test
scores. In addition, rater ID distributions by prompt were examined. Generally, it was found that
the same sets of raters rated the online and paper versions of each prompt, which reduced
the potential concern regarding the confounding of rater group and mode effect.


Table 22 includes the demographic information about the writing test study sample. Figure 40
displays the English scale score distributions for all the online and paper versions of the writing
test.


Table 22. Demographic Distributions of the Online and Paper Examinees 
for Spring 2015 


                                           Online          Paper
Prompt   Demographic                       N     Percent   N     Percent
Gender
Form 1   Male                              475   45        237   43
         Female                            575   55        317   57
Form 2   Male                              448   43        211   41
         Female                            600   57        307   59
Race/Ethnicity
Form 1   African American                  128   12        58    10
         American Indian/Alaska Native     14    1
         White                             484   46        266   48
         Hispanic/Latino                   248   24        138   25
         Asian                             84    8         44
         Two or more races                 60    6         31
         Prefer not to respond             30    3         17
Form 2   African American                  119   11        63    12
         American Indian/Alaska Native     9     1         3     1
         White                             511   49        251   48
         Hispanic/Latino                   273   26        119   23
         Asian                             81    8         42    8
         Two or more races                 25    2         21
         Prefer not to respond             30    3         19
Figure 40. English scale score distributions of the students who took the writing test
for spring 2015


Before conducting mode comparability analyses for the writing test, the latency information 
from the online test was also examined. Table 23 contains the descriptive statistics of the 
latency information of the two online prompts. As shown in Table 23, the two online prompts 
had similar latency information. The average time students spent on each prompt was around 
30 minutes with a standard deviation of around nine minutes. 


Table 23. Descriptive Statistics of Online Latency Information of Writing Test 
for Spring 2015 


Form N Mean SD Min Max 
Online_1 1086 29.55 8.67 0 39.88 
Online_2 1079 29.68 8.69 0 39.90 


Assuming the groups are indeed equivalent across modes, the following comparisons were
made for each prompt to evaluate the mode effect of the writing test: (1) means and standard
deviations of the writing domain scores, rounded average domain scores,¹ total raw scores,
and scale scores²; (2) mean score differences across modes, effect sizes of mean differences,
and t-test p-values of mean differences; (3) correlations among the domain scores, rounded
average domain raw scores, raw and scale scores, and with English scale scores; (4) plots
of the relative cumulative frequency distributions of raw and scale scores; (5) scatter plots of
English and writing raw and scale scores; and (6) plots of mean writing scores conditional on
English scale scores. As was done for the multiple-choice tests, the scale scores for the online
writing prompts used at this stage of comparison were obtained by applying the corresponding
paper version raw to scale score conversions.


¹ Starting from September 2016, ACT reports the rounded average domain scores as the ACT writing test score.


Results for the above analyses are presented in the following tables and figures. The
comparison of means and the plots of the relative cumulative frequencies indicated that the
two prompts appeared to have differential mode effects. Online students scored higher than
paper students on the Form 1 prompt (effect sizes of 0.26 to 0.31 for the writing scores in
Table 24), but not on the Form 2 prompt, where none of the mean differences was statistically
significant. In addition, the online writing scores were slightly more correlated with the
English scores for both prompts. However, the higher correlation might be an artifact of
sample size differences.


Table 24 contains the descriptive statistics, mean differences of online and paper prompt 
scores, effect sizes, and t-test p-values. Table 25 shows the correlations among writing scores 
and with English scale scores. 


Table 24. Descriptive Statistics, Effect Sizes, and t-test p-values of English 
and Writing Scores across Modes for Spring 2015 


                     Online                   Paper
                     N      Mean    SD        N     Mean    SD      Online - Paper   Effect Size   t-test p
Form 1
  Domain 1           1050   6.60    1.86      554   6.04    1.83    0.56             0.30          <.0001
  Domain 2           1050   6.40    1.79      554   5.86    1.74    0.55             0.31          <.0001
  Domain 3           1050   6.58    1.82      554   6.10    1.77    0.47             0.26          <.0001
  Domain 4           1050   7.13    1.78      554   6.57    1.75    0.55             0.31          <.0001
  Rounded Average    1050   6.76    1.79      554   6.23    1.77    0.52             0.29          <.0001
  Raw Score          1050   26.70   7.06      554   24.57   6.92    2.13             0.30          <.0001
  Scale Score        1050   19.16   6.59      554   17.17   6.44    2.00             0.31          <.0001
  English            1050   20.02   6.02      554   19.90   5.85    0.11             0.02          0.7187
Form 2
  Domain 1           1048   5.98    1.95      518   5.94    1.83    0.03             0.02          0.7404
  Domain 2           1048   5.87    1.93      518   5.81    1.79    0.06             0.03          0.5761
  Domain 3           1048   6.15    1.94      518   6.21    1.81    -0.06            -0.03         0.5578
  Domain 4           1048   6.26    1.93      518   6.41    1.71    -0.14            -0.08         0.1339
  Rounded Average    1048   6.14    1.93      518   6.18    1.78    -0.04            -0.02         0.6832
  Raw Score          1048   24.26   7.60      518   24.37   6.98    -0.11            -0.02         0.7698
  Scale Score        1048   17.50   7.65      518   17.66   7.06    -0.15            -0.02         0.6919
  English            1048   19.68   6.07      518   19.95   6.08    -0.26            -0.04         0.4223
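

The effect sizes in Table 24 can be approximated from the reported summary statistics. The sketch below assumes that the effect size is the mean difference divided by the pooled standard deviation and that the t-test is a pooled-variance two-sample test; neither choice is stated in this section. The p-value uses a normal approximation to the t distribution, which is close at these sample sizes but is not the exact computation. Because the inputs are rounded to two decimals, results can differ from the table in the last digit.

```python
import math


def pooled_sd(n1, sd1, n2, sd2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / (n1 + n2 - 2))


def compare_means(n1, m1, sd1, n2, m2, sd2):
    """Return (difference, effect size, t statistic, approximate two-sided p)."""
    sp = pooled_sd(n1, sd1, n2, sd2)
    diff = m1 - m2
    effect_size = diff / sp
    t = diff / (sp * math.sqrt(1 / n1 + 1 / n2))
    p = math.erfc(abs(t) / math.sqrt(2))  # normal approximation, large df
    return diff, effect_size, t, p


# Domain 1 summary statistics (online first, then paper) from Table 24.
form1_domain1 = compare_means(1050, 6.60, 1.86, 554, 6.04, 1.83)
form2_domain1 = compare_means(1048, 5.98, 1.95, 518, 5.94, 1.83)

for label, (diff, es, t, p) in [("Form 1", form1_domain1), ("Form 2", form2_domain1)]:
    print(f"{label}: diff={diff:.2f}  effect size={es:.2f}  t={t:.2f}  p~{p:.4f}")
```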


² The 1-36 writing scale score is only used for ELA calculation, except in 2015-2016 when it was also reported as the
overall writing score.


Table 25. Correlation among Writing Scores and with English Scores for Spring 2015 


                                 Domain   Domain   Domain   Domain   Rounded  Raw      Scale
                                 1        2        3        4        Average  Score    Score    English
Form 1  Online  Domain 1         1.00
                Domain 2         .95      1.00
                Domain 3         .96      .96      1.00
                Domain 4         .91      .90      .92      1.00
                Rounded Average  .97      .96      .97      .95      1.00
                Raw Score        .98      .98      .98      .96      .99      1.00
                Scale Score      .98      .98      .98      .96      .99      1.00     1.00
                English          .53      .52      .53      .56      .54      .55      .55      1.00
        Paper   Domain 1         1.00
                Domain 2         .95      1.00
                Domain 3         .94      .96      1.00
                Domain 4         .92      .91      .93      1.00
                Rounded Average  .97      .96      .97      .96      1.00
                Raw Score        .98      .98      .98      .97      .99      1.00
                Scale Score      .98      .97      .98      .97      .99      1.00     1.00
                English          .46      .45      .45      .48      .46      .47      .47      1.00
Form 2  Online  Domain 1         1.00
                Domain 2         .98      1.00
                Domain 3         .96      .95      1.00
                Domain 4         .93      .92      .95      1.00
                Rounded Average  .98      .97      .98      .96      1.00
                Raw Score        .99      .98      .98      .97      .99      1.00
                Scale Score      .98      .97      .98      .97      .99      1.00     1.00
                English          .59      .57      .60      .64      .60      .61      .61      1.00
        Paper   Domain 1         1.00
                Domain 2         .97      1.00
                Domain 3         .95      .93      1.00
                Domain 4         .93      .90      .96      1.00
                Rounded Average  .97      .95      .98      .96      1.00
                Raw Score        .98      .97      .98      .97      .99      1.00
                Scale Score      .98      .96      .98      .96      .99      .99      1.00
                English          .44      .42      .45      .47      .45      .45      .45      1.00


Figures 41 and 42 display the writing raw score and scale score distributions. Figures 43 and
44 show the scatter plots of English scale scores and writing raw scores and scale scores.
Figures 45 and 46 have the average writing raw scores and scale scores conditioning on
English scale scores.


Figure 41. Writing raw score distributions across prompts for spring 2015
Figure 42. Writing scale score distributions across prompts for spring 2015
Figure 43. Scatter plots of English scale scores and writing raw scores for spring 2015
Figure 44. Scatter plots of English scale scores and writing scale scores for spring 2015
Figure 45. Average writing raw scores conditioning on English scale scores for spring 2015
Figure 46. Average writing scale scores conditioning on English scale scores for spring 2015


Equating methodology was applied to remove any mode differences between the writing 
scores. Table 26 contains the descriptive statistics of the writing scale score after using the 
equating methodology. Note that except for Table 26, all writing scale scores included in the 
tables and figures in this section were from applying the paper conversions to the online 
prompts. 


Table 26. Descriptive Statistics, Effect Sizes, and t-test p-values of Writing Scale 
Scores after Applying Equating Methodology for Spring 2015 


              Online                  Paper
Scale Score   N      Mean    SD       N     Mean    SD      Online - Paper   Effect Size   t-test p
Form 1        1050   17.08   6.47     554   17.17   6.44    -0.08            -0.01         0.8064
Form 2        1048   17.10   6.51     518   17.66   7.06    -0.56            -0.08         0.1183


Spring 2015 ACT Online Administration 

Shortly after the second mode comparability study in spring 2015, the ACT was again
administered online. More than 4,000 students participated in this administration
within a multi-day window. On one of the testing days, the two online forms used in the
spring 2015 mode comparability study were randomly assigned to more than 1,800 students.
The sample sizes of the two forms were 890 and 922, respectively. This provided additional
data for examining the scale score distributions of the two online forms between randomly
equivalent groups.


With randomly equivalent groups of examinees taking each form, the distributions of scale 
scores across forms are expected to be very similar. The equivalency of the score distributions 
of the two online forms from these data was examined. In addition, the distributions of the 
online form scale scores were also compared with their distributions in the spring 2015 mode 
comparability study. 


Figure 47 presents plots of the relative cumulative frequency distributions of the scale scores 
of the two online forms. It shows that the scale score distributions were similar across the two 
forms for all subjects. Figure 48 has the distributions of the two forms in the spring 2015 mode 
comparability study added as a comparison, which shows obvious differences between the 
samples of students in these two administrations. Even though examinee proficiencies were 
different in the mode comparability study and this subsequent administration, the distributions 
of scale scores of the two forms were similar within each administration (i.e., the mode 
comparability study and the online administration). These results provided further support for 
the stability of the online form conversions obtained from the mode comparability study when 
equating methodology was applied. 
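

A relative cumulative frequency distribution of the kind plotted in Figures 47 and 48 can be computed as in the sketch below. The scale scores are hypothetical, and the largest gap between the two curves is shown only as a simple descriptive summary; it is not a statistic reported in the study.

```python
def relative_cumulative_frequency(scores, lo=1, hi=36):
    """Percent of examinees at or below each scale score from lo to hi."""
    counts = [0] * (hi - lo + 1)
    for s in scores:
        counts[s - lo] += 1
    result, running = [], 0
    for c in counts:
        running += c
        result.append(100.0 * running / len(scores))
    return result


# Hypothetical scale scores for two small groups taking different forms.
form1_scores = [18, 20, 20, 25]
form2_scores = [17, 20, 21, 25]

rcf1 = relative_cumulative_frequency(form1_scores)
rcf2 = relative_cumulative_frequency(form2_scores)

# Largest vertical gap between the two curves, in percentage points.
max_gap = max(abs(a - b) for a, b in zip(rcf1, rcf2))
print(f"largest gap between curves: {max_gap:.1f} percentage points")
```

With randomly equivalent groups and well-equated forms, the two curves are expected to lie close together across the score scale.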


Figure 47. Relative cumulative frequency distributions of scale scores for online testing
Figure 48. Relative cumulative frequency distributions of scale scores of online
testing compared with their distributions in the spring 2015 mode comparability study


Conclusion and Discussion 


The spring 2014 ACT mode comparability study was the first time that the ACT was 
administered online for operational purposes. The online versions of the reading and science 
tests had an extra five minutes compared with their paper versions, a decision made based on 
the results from the fall 2013 timing study that showed evidence of speededness for those two 
tests. The spring 2014 mode comparability study examined both item and test level differences 
across online and paper versions of the multiple-choice tests. Results showed that very small 
differences were found between the two modes in terms of test reliability, correlations among 
tests, effective weights, and factor structures. However, item and test level scores tended to be 
slightly higher for the online group than for the paper group. Equating methodology was used 
to adjust for the differences so that scale scores from the two administration mode versions 
were comparable. 


To minimize the potential between-mode differences in forthcoming online administrations,
the online timing issue was revisited based on the spring 2014 mode comparability analyses
results, an examination of the item latency information of the two online forms, and the student
survey results regarding whether they thought they had enough time to finish each test.


Taking into account results from all these analyses and the changes that had occurred to the 
online administration platform between the fall 2013 timing study and the spring 2014 mode 
comparability study, it was concluded that, going forward, the extra five minutes for the online 
reading and science tests should be eliminated. 


Another mode comparability study was conducted in spring 2015 in which the online and paper
administration times were the same for all tests. Results from the spring 2015 study showed
that the mathematics and science test scores were relatively more comparable between
online and paper administration modes, while the English and reading scores tended to be
slightly higher for the online versions. The ACT writing test with the enhanced design was
administered along with the multiple-choice tests in spring 2015. Online scores on one prompt
tended to be higher than the paper scores, but similar to the paper scores on the other prompt.
Equating methodology was applied to link the online forms to their paper counterparts for both
the multiple-choice tests and the writing test in spring 2015.


In addition to supplying the data for the analyses in this report, the mode comparability
studies and the earlier timing study also provided ACT with valuable experience in online
administration of the ACT. During the studies, feedback from students and test administrators
was collected. Besides questions on the sufficiency of testing time, students were also
asked various other questions concerning their preparation for the online testing, computer
experience and typing skills, ease of navigation and use of various features of the online
test, their use of scratch paper, and their preferred testing mode. Test takers were
also asked to provide any additional comments that they might have regarding their testing
experience. Though some students experienced difficulty during the online testing, mainly due
to technology issues, more of the students who took the online tests preferred online testing
over paper testing (53% in spring 2014 and 45% in spring 2015) than preferred paper over
online (33% in spring 2014 and 26% in spring 2015). Analyses of the feedback, together with
experiences gained in dealing with the various issues encountered, are valuable resources
that ACT can utilize in creating optimal online testing experiences for examinees while
maintaining the comparability of scores to the paper versions for future online administrations.
ACT is an independent, nonprofit organization that provides assessment,
research, information, and program management services in the broad areas 
of education and workforce development. Each year, we serve millions of 
people in high schools, colleges, professional associations, businesses, and 
government agencies, nationally and internationally. Though designed to 
meet a wide array of needs, all ACT programs and services have one guiding 
purpose—helping people achieve education and workplace success. 


For more information, visit www.act.org. 